Goto

Collaborating Authors

 Chiba Prefecture




Self-driving bus tested on road around Tokyo for first time

The Japan Times

A research team at the University of Tokyo began testing a Level 4 autonomous driving bus on public roads for the first time on Tuesday. Levels of automation are measured between 1 to 5, with Level 1 referring to technologies that assist a driver, such as adaptive cruise control, while Level 5 is complete automation. As part of their research, the team has been testing a Level 2 bus -- partially automated driving -- on the route that connects the university's Kashiwa campus to its nearest station, the Kashiwanoha-campus station, since the end of 2019. In a time of both misinformation and too much information, quality journalism is more crucial than ever. By subscribing, you can help us get the story right.


New California fee targets batteries in PlayStations, power tools and singing cards

Los Angeles Times

Things to Do in L.A. Tap to enable a layout that focuses on the article. An attendee plays the Monster Hunter Wilds video game on the Sony PlayStation 5 Pro console during the Tokyo Game Show 2024 at Makuhari Messe in 2024 in Chiba, Japan. This is read by an automated voice. Please report any issues or inconsistencies here . With the start of the new year, Californians will pay a new fee every time they buy a product with a nonremovable battery -- whether it's a power tool, a PlayStation or even a singing greeting card.


TopiCLEAR: Topic extraction by CLustering Embeddings with Adaptive dimensional Reduction

Fujita, Aoi, Yamamoto, Taichi, Nakayama, Yuri, Kobayashi, Ryota

arXiv.org Artificial Intelligence

Rapid expansion of social media platforms such as X (formerly Twitter), Facebook, and Reddit has enabled large-scale analysis of public perceptions on diverse topics, including social issues, politics, natural disasters, and consumer sentiment. Topic modeling is a widely used approach for uncovering latent themes in text data, typically framed as an unsupervised classification task. However, traditional models, originally designed for longer and more formal documents, struggle with short social media posts due to limited co-occurrence statistics, fragmented semantics, inconsistent spelling, and informal language. To address these challenges, we propose a new method, TopiCLEAR: Topic extraction by CLustering Embeddings with Adaptive dimensional Reduction. Specifically, each text is embedded using Sentence-BERT (SBERT) and provisionally clustered using Gaussian Mixture Models (GMM). The clusters are then refined iteratively using a supervised projection based on linear discriminant analysis, followed by GMM-based clustering until convergence. Notably, our method operates directly on raw text, eliminating the need for preprocessing steps such as stop word removal. We evaluate our approach on four diverse datasets, 20News, AgNewsTitle, Reddit, and TweetTopic, each containing human-labeled topic information. Compared with seven baseline methods, including a recent SBERT-based method and a zero-shot generative AI method, our approach achieves the highest similarity to human-annotated topics, with significant improvements for both social media posts and online news articles. Additionally, qualitative analysis shows that our method produces more interpretable topics, highlighting its potential for applications in social media data and web content analytics.


Generative AI for Self-Adaptive Systems: State of the Art and Research Roadmap

Li, Jialong, Zhang, Mingyue, Li, Nianyu, Weyns, Danny, Jin, Zhi, Tei, Kenji

arXiv.org Artificial Intelligence

Self-adaptive systems (SASs) are designed to handle changes and uncertainties through a feedback loop with four core functionalities: monitoring, analyzing, planning, and execution. Recently, generative artificial intelligence (GenAI), especially the area of large language models, has shown impressive performance in data comprehension and logical reasoning. These capabilities are highly aligned with the functionalities required in SASs, suggesting a strong potential to employ GenAI to enhance SASs. However, the specific benefits and challenges of employing GenAI in SASs remain unclear. Yet, providing a comprehensive understanding of these benefits and challenges is complex due to several reasons: limited publications in the SAS field, the technological and application diversity within SASs, and the rapid evolution of GenAI technologies. To that end, this paper aims to provide researchers and practitioners a comprehensive snapshot that outlines the potential benefits and challenges of employing GenAI's within SAS. Specifically, we gather, filter, and analyze literature from four distinct research fields and organize them into two main categories to potential benefits: (i) enhancements to the autonomy of SASs centered around the specific functions of the MAPE-K feedback loop, and (ii) improvements in the interaction between humans and SASs within human-on-the-loop settings. From our study, we outline a research roadmap that highlights the challenges of integrating GenAI into SASs. The roadmap starts with outlining key research challenges that need to be tackled to exploit the potential for applying GenAI in the field of SAS. The roadmap concludes with a practical reflection, elaborating on current shortcomings of GenAI and proposing possible mitigation strategies.


OmniTFT: Omni Target Forecasting for Vital Signs and Laboratory Result Trajectories in Multi Center ICU Data

Xu, Wanzhe, Dai, Yutong, Yang, Yitao, Loza, Martin, Zhang, Weihang, Cui, Yang, Zeng, Xin, Park, Sung Joon, Nakai, Kenta

arXiv.org Artificial Intelligence

Accurate multivariate time-series prediction of vital signs and laboratory results is crucial for early intervention and precision medicine in intensive care units (ICUs). However, vital signs are often noisy and exhibit rapid fluctuations, while laboratory tests suffer from missing values, measurement lags, and device-specific bias, making integrative forecasting highly challenging. To address these issues, we propose OmniTFT, a deep learning framework that jointly learns and forecasts high-frequency vital signs and sparsely sampled laboratory results based on the Temporal Fusion Transformer (TFT). Specifically, OmniTFT implements four novel strategies to enhance performance: sliding window equalized sampling to balance physiological states, frequency-aware embedding shrinkage to stabilize rare-class representations, hierarchical variable selection to guide model attention toward informative feature clusters, and influence-aligned attention calibration to enhance robustness during abrupt physiological changes. By reducing the reliance on target-specific architectures and extensive feature engineering, OmniTFT enables unified modeling of multiple heterogeneous clinical targets while preserving cross-institutional generalizability. Across forecasting tasks, OmniTFT achieves substantial performance improvement for both vital signs and laboratory results on the MIMIC-III, MIMIC-IV, and eICU datasets. Its attention patterns are interpretable and consistent with known pathophysiology, underscoring its potential utility for quantitative decision support in clinical care.


SAVeD: Semantic Aware Version Discovery

Frenk, Artem, Shraga, Roee

arXiv.org Machine Learning

Our work introduces SAVeD (Semantically Aware Version Detection), a contrastive learning-based framework for identifying versions of structured datasets without relying on metadata, labels, or integration-based assumptions. SAVeD addresses a common challenge in data science of repeated labor due to a difficulty of similar work or transformations on datasets. SAVeD employs a modified SimCLR pipeline, generating augmented table views through random transformations (e.g., row deletion, encoding perturbations). These views are embedded via a custom transformer encoder and contrasted in latent space to optimize semantic similarity. Our model learns to minimize distances between augmented views of the same dataset and maximize those between unrelated tables. We evaluate performance using validation accuracy and separation, defined respectively as the proportion of correctly classified version/non-version pairs on a hold-out set, and the difference between average similarities of versioned and non-versioned tables (defined by a benchmark, and not provided to the model). Our experiments span five canonical datasets from the Semantic Versioning in Databases Benchmark, and demonstrate substantial gains post-training. SAVeD achieves significantly higher accuracy on completely unseen tables in, and a significant boost in separation scores, confirming its capability to distinguish semantically altered versions. Compared to untrained baselines and prior state-of-the-art dataset-discovery methods like Starmie, our custom encoder achieves competitive or superior results.


The Neural Hawkes Process: A Neurally Self-Modulating Multivariate Point Process

Hongyuan Mei, Jason M. Eisner

Neural Information Processing Systems

Many events occur in the world. Some event types are stochastically excited or inhibited--in the sense of having their probabilities elevated or decreased--by patterns in the sequence of previous events. Discovering such patterns can help us predict which type of event will happen next and when. We model streams of discrete events in continuous time, by constructing a neurally self-modulating multivariate point process in which the intensities of multiple event types evolve according to a novel continuous-time LSTM . This generative model allows past events to influence the future in complex and realistic ways, by conditioning future event intensities on the hidden state of a recurrent neural network that has consumed the stream of past events. Our model has desirable qualitative properties. It achieves competitive likelihood and predictive accuracy on real and synthetic datasets, including under missing-data conditions.